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PEAKS IN X-RAY DIFFRACTOGRAMS 1 
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University Duisburg-Essen and Technische Universitat Dortmund 

We consider data consisting of photon counts of diffracted x-ray 
radiation as a function of the angle of diffraction. The problem is 
to determine the positions, powers and shapes of the relevant peaks. 
An additional difficulty is that the power of the peaks is to be mea- 
sured from a baseline which itself must be identified. Most methods 
of de-noising data of this kind do not explicitly take into account 
the modality of the final estimate. The residual-based procedure we 
propose uses the so-called taut string method, which minimizes the 
number of peaks subject to a tube constraint on the integrated data. 
The baseline is identified by combining the result of the taut string 
with an estimate of the first derivative of the baseline obtained using a 
weighted smoothing spline. Finally, each individual peak is expressed 
as the finite sum of kernels chosen from a parametric family. 

1. Introduction. In the analysis of the morphology of thin films, x-ray 
diffraction is an indispensable tool [Birkholz (2006)]: the intensity of diffracted 
x-rays yields important information about the crystalline structure of the 
material under consideration. The experimental data are usually obtained in 
the form of a diffractogram: photon counts of x-ray radiation are measured 
as a function of the angle of diffraction 26. 

A typical diffractogram, as shown in Figure 1, exhibits peaks as well as 
a slowly varying baseline. The physically relevant information is contained 
in the location, shape and size of the peaks and their decomposition into a 
sum of one or more possibly overlapping components represented by kernels. 
Often thin film diffractograms are analyzed using ad-hoc methods where 
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denoising, removal of the baseline and fitting of the peaks are performed 
manually. Apart from being inconvenient, this often requires knowledge of 
possible peak positions. 

In this article we suggest a new flexible automatic procedure for the anal- 
ysis of thin film diffractograms. Our aim is to separate the signal of interest 
from the noise. More specifically, we aim at a decomposition of the form 

(1) Data = Baseline + Peaks + Noise. 

Our fully automatic five-step procedure (see Section 2 below) removes the 
baseline and determines the number, positions, powers and shapes of the 
relevant peaks and their components. It can be applied when little or no prior 
knowledge of approximate peak positions is available, as is often the case in 
the analysis of the morphology of thin films. Throughout all stages of the 
procedure, we employ the following principle: among all models we choose 
the simplest one which is consistent with the data. That is, "simple" models 
are favored over "complex" models, but the definition of "simplicity" or, 
equivalently, "complexity" depends on the particular problem to be solved. 
We use three different definitions of complexity, namely, 

• the number of peaks, 

• the value of / (6) 2 d9 as a measure of roughness of the function g, 

• the number of components or kernels in the representation of each indi- 
vidual peak. 

More formally, we first construct an approximation or confidence region 
(Section 3) using special multiscale conditions for the residuals. This specifies 
the set of functions consistent with the data. Within this class we then choose 
a model with minimum complexity [cf. Davies, Kovac and Meise (2008)]. 

To carry out this program, we make use of recent advances in nonparamet- 
ric regression and denoising techniques, in particular, the taut string method 
of Davies and Kovac (2001) and the weighted smoothing splines procedure 
of Davies and Meise (2008). The taut string method reliably identifies the 
local extremes of the regression function and it is used to provide initial 
estimates of the "Peaks" component of (1). Weighted smoothing splines are 
then used in conjunction with the known positions of the peaks to provide a 
smooth estimate of the "Baseline" component of (1). Finally, we fit sums of 
Pearson Type VII curves to the identified peak intervals in order to decom- 
pose the peaks into their components and estimate the physically relevant 
parameters. What remains is the "Noise" component of (1). 

We note that the application of the proposed method is not limited to 
thin film x-ray diffractograms. With little or no modification the proce- 
dure could also be applied to other types of diffractograms, for example, of 
powders or partly crystalline fibers of various materials. A wide range of 
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spectroscopic methods yield data of a similar nature and require the unam- 
biguous and automated identification of the position and width of relatively 
sharp peaks. Other applications could, for example, include the analysis of 
Raman-, FTIR- or NMR spectra and mass spectrometry data. 

The paper is organized as follows. Section 2 gives some physical back- 
ground and a description of the data sets as well as a short outline of 
our method. In Section 3 we introduce the statistical principles on which 
our procedure is based. Section 4 contains a short description of the taut 
string method and Section 5 introduces some modifications to accomodate 
for heteroskedastic noise. Section 6 describes the weighted smoothing splines 
procedure. Section 7 is devoted to the identification of the baseline and Sec- 
tion 8 to the identification and decomposition of the peaks. Section 9 gives 
a physical interpretation of the results. Finally, Section 10 contains a short 
discussion of the complete procedure. 

2. Diffractograms. X-ray diffraction is an important tool in various fields, 
including the analysis of crystalline materials, the identification of the molec- 
ular structure of proteins and, more recently, also as means to investigate the 
morphology of thin films. When thin films are prepared on glass substrates 
they are usually polycrystalline and may even contain different crystalline 
phases. The experimental data are usually obtained in the form of a diffrac- 
togram: intensity versus diffraction angle 29. The physically relevant infor- 
mation lies in the position, the power and the half-width of the peaks. For 
the physical background of x-ray diffractometry of thin films as well as the 
interpretation of the data obtained, we refer to Birkholz (2006). The peak po- 
sitions are characteristic for the crystalline structures present in the sample. 
Small shifts of the peaks with respect to the ideal positions are often related 
to mechanical strain in the crystalline lattice arising from lattice imperfec- 
tions introduced during thin film preparation. From the peak power the rela- 
tive abundance of a specific crystalline orientation can be estimated allowing 
the determination of the texture of crystalline orientations. Such an analy- 
sis has been performed, for example, in the case of thin films of In203:Sn 
prepared by various deposition techniques [Mergel, Thiele and Qiao (2005)]. 

The half-width of the peak is related to the crystallite size and to in- 
homogeneous strain within the crystallites. These parameters are strongly 
influenced by the preparation conditions and determine to a large degree 
the optical and electrical properties of the thin films. 

Methods for the analysis of x-ray diffractograms have been developed 
mainly in the context of powder diffractometry. Although the underlying 
physical principles are the same, some care has to be taken in applying 
techniques from powder diffractogram analysis to the case of thin films. 
For a general review of the physical background required for analyzing thin 
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Fig. 1. T/ie intensity of diffracted x-rays as a function of the angle of diffraction (29). 

film diffractograms as well as a discussion of similarities and differences be- 
tween powder and thin film diffractometry, we refer to Chapter 3 of Birkholz 
(2006). 

In our laboratory practice, we have so far used an ad-hoc method to 
evaluate the x-ray diffractograms. It has proved to be adequate when the 
potential peak positions were known a priori, that is, in cases where the 
produced material was already identified [Mergel, Thiele and Qiao (2005)]. 
With this method, the baseline of the data, arising from the noise level of 
the signal channel, was taken as a piecewise linear interpolation between the 
intensity values at positions in the middle between two neighboring theo- 
retical positions. Denoising was done by averaging the data in a pre-defined 
abscissa interval. The peak position was then looked for in the vicinity of the 
theoretical positions and the shape of the peak was fitted with a Gaussian 
kernel. This method uses optimization criteria for noise that are statisti- 
cally not well founded and shape functions that are often inadequate for 
x-ray peaks. Furthermore, in the general case, the crystalline structures in 
the films are not known and, therefore, diffraction peaks can occur at ar- 
bitrary values of 26. This means that the search procedure as a whole is 
not applicable. The automatic procedure we describe below does not rely 
on a-priori knowledge of peak positions and uses more flexible models of the 
baseline, the peaks, the kernels and the noise. 

A typical data set is shown in Figure 1 . The material under consideration 
here is a thin film of In203:Sn. Although it is not obvious from the figure, 
the data, being counts of photons, are integers. A simple stochastic model 



PEAKS IN X-RAY DIFFRACTOGRAMS 



5 



for the counts at an angle of diffraction 29 is the Poisson distribution with 
mean f(26) for an appropriate function /. The noise present in the data does 
not exhibit any obvious dependencies so that a Poisson model is completely 
specified by fixing / and then taking the observations at each 28 to be 
independently distributed. For large f(28) this accurately describes the noise 
level, but for small f(26) the noise level is underestimated because of ground 
noise due to the electronics. In practice, the standard deviation of the noise 
is at least 7 which, in the Poisson model, corresponds to a mean of about 50. 
For such large parameter values the Poisson distribution can be adequately 
modeled by a normal distribution with mean and variance 50. This leads to 
the model 

(2) Y(t) = f(t) + a(t)Z(t), 0<t<l, 

where / : [0, 1] — ► R, Z(t) is standard Gaussian white noise and a(t) is a 
function of f(t). We have here followed standard practice in statistics and 
defined the model on the interval [0, 1] , whereas the actual data are defined 
for 29 in [15,85]. This should cause no problems. In thin film diffractometry 
measurements are usually taken for equidistant angles, but the method we 
propose holds also for nonequispaced measurements. 

The procedure we propose determines the number, positions, powers and 
shapes of the relevant peaks as measured from the baseline and then decom- 
poses them into a sum of kernels. It consists of five steps: 

(1) The data are approximated using the taut string method. This yields 
a first estimate of the number, the positions and the heights of the peaks. 

(2) An estimate of the first derivative is obtained from a weighted smooth- 
ing spline fitted to the original data. 

(3) The peak intervals are determined from the positions of peaks accord- 
ing to (1) and a threshold for the derivative obtained in (2). 

(4) The baseline is obtained by fitting a spline to the remaining data set 
after removing the peak intervals. 

(5) The peaks with the baseline subtracted are fitted within their respec- 
tive intervals by a sum of Pearson Type VII curves. 

3. The approximation or confidence region. We now describe the con- 
struction of the confidence region which provides the basis of our concept of 
approximation. The following is based on Davies, Kovac and Meise (2008). 
Suppose we have data Y n = {(ti, Y(ti)),i = 1, . . . , n}, < t\ < ■ ■ ■ < t n < 1 
which are generated under the model 

(3) Y{t) = f(t) + aZ(t), < t < 1. 

This differs from the model (2) only in that we here assume a constant noise 
level a. For any function g : [0, 1] — > M, we define the residuals by 

(4) r(Y n ,t i ,g) = Y(t i )-g(t i ) 
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and the standardized sums of the residuals over intervals Id {1,. .. ,n} by 

(5) w(Y n ,I,g) = —=^2r(Y n ,ti,g), 

where |/| denotes the number of points ti with i in I. For a given family ^T n 
of intervals of {1, . . . ,n}, an a-confidence region for / is given by 

(6) A n =A(Y n ,a,l n ,T n ) = lg:m&x\w(Y n ,I,g)\ < a^r n logn>, 
where r n = r n (a) is chosen such that 



(7) Pi max 



yiGln y/\I 



< ^Tnlogn = a. 



To see this, we note that if the data were generated under (3), then (7) 
implies that P(f G A n ) = a. A function g belongs to A n if and only if its 
vector of evaluations at the design points (g(ti), . . . , g(t n )) belongs to the 
convex polyhedron in IR n which is defined by the linear inequalities 



(8) 



J2(Y(t.) - g(t,)) 



< oVT n log n, Idl n . 



We mention that by using an appropriate norm [Mildenberger (2008)] A n 
can also be expressed as a ball centered at Y n . The family X n we use will be a 
dyadic multiresolution scheme as for wavelets. It consists of all single points 
[i, i], the pairs [1, 2], [3, 4], ... , the sets of four [1, 4], [5, 8] etc. and including all 
final intervals whether or not they are of this form. The procedure is therefore 
not restricted to sample sizes n which are a power of 2. The number of such 
intervals is at most 2n and this collection has proved sufficiently fine for 
x-ray diffractograms. An exception is the last step of our procedure, where 
we consider small segments of the data set separately and use the family of 
all subintervals of such a segment. The use of such a scheme I n forces any 
function g in A n to adapt to the data at all resolution levels from single 
points to the whole interval. Since the noise level a of the data usually is 
not known in advance, we derive it from the data by using 

(9) a n = Median{|Y(ti) - y(f;_i)|,2 < i < n - l}/($ _1 (0.75)\/2). 

Now An = A(Y n,cr n ,I n ,T n ) is no longer exact, but it is honest [Li (1989)] 
in that the coverage probability is now at least a [Davies, Kovac and Meise 
(2008)]. The value of r n in (7) can always be determined by simulations for 
any n and a. It follows, however, from a result of Diimbgen and Spokoiny 
(2001) on the uniform modulus of continuity of the Brownian motion that 
linin^oo r n = 2 whatever a. A much more precise result is given in Kabluchko 
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(2007). In practice, we use the default value r n = 2.5, which has proved 
satisfactory for the thin film data sets. If it is more important to capture 
all relevant peaks, even small ones, at the cost of random fluctuations being 
identified as a peak, a lower value of r n can be chosen and even calibrated 
to give a desired false discovery rate [Benjamini and Hochberg (1995)]. 

For data y n = {(U,y{U)),i = 1, . . . , n} not necessarily generated the model 
(3), we refer to A(y n ,a n ,I n ,T n ) as an approximation region. Any function 
fn G A(y n i °Ym -^n ) 7n ) will be regarded as an adequate approximation to the 
data y n . 

As mentioned in Section 2 for the thin film data, the noise level for large 
values of y(U) is of the order \Jy{U) and is consequently underestimated by 
a n of (9). At the same time, for small values of y(U), the noise is underes- 
timated by y/y(ti) but correctly estimated by a n . This leads to the model 
(2) with 



(10) a(t)=max((T, v //(t)). 

We overcome this additional complexity by first obtaining an adequate ap- 
proximation f n of the data based on the a n of (9). We then take the noise 
level at the angle of diffraction ti to be 



(11) S n (ij) = max(cr n , \Jfn{ti)). 
We must now replace (4) by 

(12) r(y n ,ti,g,T; n ) = — — — — — 
and (5) by 

(13) w{y n ,I,g,T, n ) = — = ^f(y n ,*i, g, S n ). 
The resulting confidence region A n is then given by 

A n — ^4(y n , 5] n ,X n , Tn) 

(14) 

= ^:max|?7j(y n ,I,g,S n )| < VrvJogn 

The approximation regions A n and A n include many functions which 
are of no interest. For example, all functions g which interpolate the data 
belong to both. Interest always centers on the simplest functions where the 
definition of simplicity depends on the problem at hand. To detect peaks 
we are interested in minimizing the number of peaks subject to the function 
lying in A n or A n - We accomplish this by using the taut string method 
which is described in the next section. The taut string estimate is a piecewise 
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constant function and is therefore not suitable for identifying the baseline. 
As the baseline is a slowly varying function, it can be associated with a small 
first derivative. The second concept of simplicity we use is therefore based 
on smoothness and is defined by 

(15) f V 2) (*) 2 dt. 

Jo 

This leads to a problem of quadratic programming which is not feasible for 
data sets as exhibited in Figure 1. The sample size is n = 7001 and the data 
show a high degree of local variability. We therefore use an approximate 
procedure based on weighted smoothing splines which results in a cubic 
spline. We use its first derivative to identify the baseline. 

The idea of approximation regions for nonparametric regression is implicit 
in Davies (1995), where it is based on runs of the signs of the residuals. Both 
definitions are used explicitly in Davies and Kovac (2001). Approximation 
regions based on the sums of the signs of the residuals over intervals rather 
than the residuals themselves have been given by Dumbgen (2003, 2007) 
and Dumbgen and Johns (2004). 

4. The taut string method. In this section we give a short description of 
the taut string method based on a small artificial data set. Panel 1 of Figure 
2 shows data generated under (2) with f(t) = 2.5sin(4-7rt) evaluated at the 
points ti = i/32,i = 1, . . . , 32 and with a = 1. The first step is to calculate 
the partial sums of the observations Y(ti): 

(16) 5y(ti) = -E y (^)' i = l,...,n, Sy(0) = 0. 

These are shown in panel 2 of Figure 2. We now form a tube centered on 
the cumulative sums with an upper bound U and a lower bound L defined 
by 

(17) U(U) = S Y (U) + e, 1 < % < n - 1, U(0) = 0, U(l) = Sy (1), 

(18) L(U) = S Y (U) -e, 1 < i < n- 1, L(0) = 0, L(l) = Sy(l). 

The boundary conditions U(0) = L(0) = and U(l) = L(l) = Sy(l) are 
chosen to reduce edge effects. The resulting tube is shown in panel 3 of Figure 
2. The taut string function TS is best understood by imagining a string 
constrained to lie within the tube and tied down at (0,0) and (l,5y(l)), 
which is then pulled until it is taut (cf. panel 4 of Figure 2). There are 
several equivalent analytic ways of defining this. The taut string is a linear 
spline with automatic choice of knots. Panel 5 of Figure 2 shows the knot 
locations. As an estimate ft s ,n °f /> we take the right derivative of the taut 
string, except at the last point where we take the left derivative. Closer 
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consideration shows that this can be improved. The derivative of the taut 
string has a local maximum when the taut string switches from the upper to 
the lower boundary. The value of the derivative on this section is therefore 
less than the mean of the Y- values. Thus, if we define the estimate at cross- 
over intervals as the mean of the y-values between the knots, we obtain a 
better approximation without altering the number of local extremes. The 
same reasoning applies to local minima. The function ft s ,n obtained in this 
manner is shown in panel 6 of Figure 2. The connection with the number 
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Fig. 2. Panel (1) shows some noisy sine data. Panel (2) shows the cumulative sums of 
the data. Panel (3) shows the tube derived from the cumulative sums. Panel (4) shows the 
taut string through the tube. Panel (5) shows the taut string through the tube with marked 
knots. Panel (6) shows the data with the right-hand derivative of the taut string. 
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of local extremes which explains the efficacy of the method is the following. 
Consider all absolutely continuous functions H which are constrained to lie 
within the tube. Then the derivative of the taut string fts,n has the smallest 
number of local extreme values and, in particular, it has the smallest number 
of peaks. 

This still leaves open the question of the diameter of the tube. Since e 
controls the closeness to the data, the basic idea is to start with a very large 
e which contains the integral of the mean of the data which, in this case, is 
the taut string solution f\ s n . Using the confidence region A n , we determine 
those intervals /£l„ for which \ w(y n ,I, ft s , n )\ > a n\lT n logn. For any point 
i which lies in any such interval, we reduce the diameter of the tube by a 
fixed factor q < 1 at the points tj and ij+i. The default value of q which we 
use is 0.9. A new taut string estimate ff s n is calculated and the procedure 
repeated in the obvious manner until the estimate lies in A n . The default 
version of the taut string uses a n as specified by (9), r n = 2.5 and I n as the 
dyadic multiresolution scheme defined above. The method is fully automatic 
and does not require the choice of a tuning parameter. As X n contains at 
most 2n intervals and the taut string has an algorithmic complexity of O(n), 
it follows that the whole procedure has an algorithmic complexity of order 
O(relogra) when the squeezing of the tube is taken into account. Large data 
sets with n = 10 6 and more can be processed in less than one minute. 

Panel 1 of Figure 3 shows the result of applying this procedure to an x-ray 
diffractogram. As mentioned above, this underestimates the noise level for 
large values of y(t) and may result in side lobes on the large peaks. One 
such peak is shown in panel 2 of Figure 3. We denote this initial estimate 
by fts,n and use it in the definition of S n of (11). This gives rise to the 
approximation region A n of (14) and we can now repeat the taut string 
procedure. The result is denoted by f* s n . Figure 4 shows / t * n for the data 
set of Figure 1. As it can be seen in panel 3 of Figure 3, which shows f* sn 

of the same section as fts,n i n panel 2, the side lobes have been removed 
while leaving the rest of the initial estimate unaltered. It is clear that the 
automatic taut string method as just described has produced very good 
resolutions of the peaks and it has not created peaks where none should be. 

5. Heteroscedascity of the ground noise. Although the ground noise due 
to the electronics is usually more or less constant, there are cases where it 
varies sufficiently to cause the taut string to generate additional peaks. This 
can be avoided by making a nonparametric estimate of the noise level. We 
follow the approach of Davies (2006). Consider the model 

(19) V(t) = (r(t)Z(t), 0<t<l, 

where a(t) > for all t and Z(t) is standard white Gaussian noise. Given 
measurements V n = {(U,V(ti)),i = 1, . . . , n}, < t± < ■ ■ ■ < t n < 1 generated 
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Fig. 3. Top: One de-noised x-ray diffractogram, using the constant noise estimate given 
in (9) and a section (bottom left). Bottom right: The same section for the approximation 
using the local noise estimate given in (11). 
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Fig. 4. The de-noised data of Figure 1. 



under (19), it follows that, for any interval / C {1, . . . , n}, 

(20) E^) a M*«) 2 = xf/|, 
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where, as before, |/| denote the number of points t, with i E I. We write 

(21) = | s : qchisq(l - a n , \I\) < ^ V(U) 2 / s(U) 2 < qchisqK, |/|) 

for all I El, 

where (i) I n denotes a family of subintervals of {l,...,n}, (ii) s denotes a 
strictly positive function defined on [0, 1] and (iii) qchisq(p, k) denotes the 
pth quantile of the chi-squared distribution with k degrees of freedom. For 
any given a, we may choose a n so that A^ is a universal, exact and non- 
asymptotic confidence region for cr(t). The value of a n can be determined 
by simulations. More easily, if we take I n to be the set of all intervals and 
define a n by 

(22) a n = 1 - exp(-0.5r log(n)) / yVr log(n) 

with r = 3, then it may be checked by simulations that A^ has a coverage 
probability of at least 0.95 for 500 <n< 10000. The regularization we choose 
is to take s n to be piecewise constant and then to minimize the number of 
intervals of constancy subject to s n E A°- This is a difficult optimization 
problem, but the following simplified procedure works well in practice. We 
start with the interval J± = [1, 1] and put 

We now check the inequalities defining A„, but only for all intervals I C J±. 
These clearly hold for J\ = [1,1]. At the /cth stage J\ = [1, k], we increase J± 
by including k + 1 and continue in this manner until we reach the end of the 
sample or, for some k the conditions do not hold for J\ = [l,k + 1]. We then 
take [1, k\ as the first interval of constancy and repeat the same process for 
the second interval. This procedure is continued in the obvious manner until 
the last sample point is reached. Although this does not solve the original 
optimization problem, it is clear that it gives a lower bound to the number 
of intervals of constancy. Moreover, it is not difficult to show that if a(t) 
of (19) is piecewise constant on a finite number of nondegenerate intervals, 
then the procedure will consistently estimate the number of intervals and 
their endpoints. 

To allow for heteroscedascity, we proceed as follows. First we use the taut 
string method with a constant a n given by (9) to give a first approximation 
f n . We calculate the residuals and approximate their absolute values by a 
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Fig. 5. The upper panel shows the taut string reconstruction with constant ground noise. 
The center panel shows the absolute values of all the residuals together with the piecewise 
constant approximation. The bottom panel shows the final reconstruction for the same 
section as the top panel. 



piecewise constant function s n (t) as described above. Finally we replace (11) 
by 

(23) S n (*i) = max(s n (*i), \J f n {U)) 

and then proceed as before. The upper panel of Figure 5 shows a section 
of a data set and the taut string approximation with additional peaks be- 
tween the angles 26 and 30. The center panel shows the absolute values of 
all residuals together with the piecewise constant approximation s n (t): the 
heteroscedascity is evident. The small values of s n (t) are due to two large 
peaks where the residuals are very small. The bottom panel shows the taut 
string reconstruction using (23). 

6. Weighted smoothing splines. After having determined the number 
and locations of the peaks, the next step is to identify the baseline. We 
do this by fitting a smooth function to the data and then identifying the 
baseline by the size of the first derivative. As mentioned above, ideally we 
would like to minimize (15) subject to g € A n , using f* s n in (11) or in the 
heteroscedastic case (23). As A n is defined by a series of linear inequalities 
this, after discretization, leads to a quadratic programming problem which is 
in principle solvable, but for large data sets and/or data with large variations 
in local smoothness there are considerable numerical problems. Because of 
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this we take an approach based on weighted smoothing splines which is 
as follows [Davies and Meise (2008)]. For given weights A = (Ai, . . . , A n ), we 
consider the solution of the following minimization problem: 

n i 

(24) S x (g):=J2Hy(U)-g(U)) 2 + / (g (2) (t)) 2 dt — min! 

The solution is a natural cubic spline which we denote by f wssn . The weights 
A are data dependent and chosen to ensure that f wss ,n £ A n . As the smooth- 
ness of the solution of (24) increases when the values of the Aj decrease, we 
wish to choose the weights A to be as small as possible subject to f wss ,n £ A n . 
We do this in a manner similar to that used in the taut string procedure. 
We start with very small weights Aj so that the solution is almost a straight 
line which we denote by f^ ssn - We determine those points U which lie in 
intervals / for which \w(y n ,I, f^ ss n , S n )| > \Jr n log n. At such points we in- 
crease the Aj by a factor of q > 1. The default value we use is q = 2. The 
solution f^ ss n is calculated and the procedure is continued in the obvious 
manner until the solution lies in A n . The first panel of Figure 6 shows the 
result of the weighted smoothing spline, f wss ,n, for the data set of Figure 1. 

The second panel shows the first derivative fwss,n- The smoothness of the 
solution can be seen from the third panel of Figure 6 which shows the same 
section of the data as Figure 4. 

7. Identifying the baseline. To identify the baseline, we combine the 
results of the taut string, f* s n , and the weighted smoothing spline approxi- 
mation, f wss ,n- The baseline is identified by the size of the derivative fwss,n 
of fwssn- The taut string estimate is piecewise constant, so first we iden- 
tify those intervals which correspond to the local maxima of f* sn - For each 

specified interval we find to with fwss,n(to) ~ an d to inside or close to the 
actual interval. Afterward we determine t\ 2 < tj x <to< t ri < t r2 with 

(25) \fi% n (t k )\ « \fi% n (t n )\ « Median(|/« S) J), for i = 1,2, 

and fwL,n{t) > for t E [ti 2 ,t h ] and fw]s,n(t) < for t G [t n ,t r2 ]. The ini- 
tial interval is then extended to [ti 2 ,t r2 \. The final intervals are taken as 
delimiting the peak. The maximum width of the peaks is bounded by five 
degrees 26. The remaining data set is approximated again using a weighted 
smoothing spline and the result fj,i n is the estimate of the baseline. The 
upper panel of Figure 7 shows the dataset of Figure 1 with automatically 
fitted baseline. The lower panel shows the data after the baseline has been 
subtracted. 
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Fig. 6. The upper panel shows the weighted smoothing spline estimate f wss of the data of 
Figure 1. The middle panel shows the first derivative f^ss together with the used threshold 
(dotted line) and the bottom panel shows a section of f WS s- 




Fig. 7. Baseline approximation and data with removed baseline. 
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8. The decomposition of the peaks. We now address the third problem, 
which is to decompose each peak into a finite sum of kernels. In the sim- 
plest case where the kernels are translations of one specified kernel this is a 
deconvolution problem which is known to be an ill-posed inverse problem. 
The problem we wish to solve is even more difficult, as the kernels are not 
fixed but only taken to belong to a parametric family of kernels. Apart from 
location, they may also differ in width and shape. To solve ill-posed inverse 
problems, some form of regularization is required. We regularize by mini- 
mizing the number of components subject to the condition that the result 
is consistent with the data in the sense defined in Section 3. In general, the 
solution will not be unique and stable, as there may be several solutions 
with very different kernels. We give an example in the next section. The 
exact mathematical formulation leads to a nonconvex minimization prob- 
lem with a large number of local minima, so there is no method which is 
always guaranteed to yield at least one solution with the minimum number 
of components. We use a random search algorithm which has the advantage 
of producing different solutions rather than just one. As not all solutions 
are physically relevant, a deterministic method may well produce a solu- 
tion, but one which is not physically relevant. An example of a peak with 
two essentially different solutions is shown in Figure 10 in the next section: 
No adequate approximation with one kernel is found, but there exist differ- 
ent combinations of two kernels which give adequate approximations to the 
data. 

The algorithm is as follows. First the intervals defined by (25) are treated 
separately. Let {t[,ti + i, . . . , t m } C {ti, . . . , t n } C [0, 1] be the segment under 
consideration and L := m — I + 1 its length. We denote by 



the measurements in the interval, where the baseline /&/ n has been sub- 
tracted but will subsequently be used for the standardization of residuals. 
We now construct an approximation to the data y(ti), . . . ,y(t m ) which we 
will denote by f p k,n(t)- Note that f p k,n is only defined on the peak intervals. 

In much the same manner as in the previous sections, we start with the 
simplest model, one kernel, and then check whether an adequate approxi- 
mation f p k,n{pl)i ■ ■ ■ > fpk,n{pm) t° the data exists, that is, whether the appro- 
priately standardized residuals satisfy 



for all intervals I C {ti, . . . ,t m }. We give details on the choice of the set of 
intervals, the noise level S n and the threshold Cl below, after the description 
of the procedure. Model complexity (the number of kernels) is increased until 
the criterion is satisfied. Physical characteristics of interest like power, full 



yiti) = y(U) ~ fbl,n{U) 



for i = 1, . . . ,m 



(26) 
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width at half maximum (FWHM) and exact location of the peak components 
can then be calculated from the estimated components. 
Each decomposition is of the form 

k 

(27) /(t) = I>p(t;#), 

i=l 

where k denotes the number of kernels (starting with k = 1) and ji are 
nonnegative weights. The kernels p depend on a vector of parameters in- 
cluding location and shape parameters. Depending on the parameterization, 
the weights ji correspond either to the maximum height or to the power of 
the peak component. The number and interpretation of the parameters as 
well as the range of admissible values depend on the family of curves used. 
Several choices of kernels are possible, but the most widely used families 
all include densities of the Gaussian and Cauchy (Lorentz) distributions as 
extreme cases, say [Birkholz (2006), Chapter 3]. Among these families are 
Voigt functions, which are convolutions of Gaussian and Cauchy densities, 
so-called pseudo- Voigt functions, which are convex combinations of Gaus- 
sian and Cauchy densities, and Pearson Type VII curves. The approach 
presented here is not limited to these families of curves and should work 
for any suitably chosen parametric family of kernels including asymmetric 
ones. 

In the following, we will only consider the Pearson Type VII family, since 
it works well for our data and avoids some numerical difficulties that occur 
when using Voigt or pseudo- Voigt functions. The Pearson Type VII fam- 
ily consists of affine transformations of t-distributions; see Chapter 28.6 in 
Johnson, Kotz and Balakrishnan (1995). This family has been used for fit- 
ting diffraction peaks for a long time; cf. Hall et al. (1977). 

The curves have the form 

(28) p(t;P)=p(t; t x i ,m i ,a i ) = (l + ^^ 

\ afrrii 

where fii is the location parameter, a, measures the width, and rrii > 1 
determines the shape of the curve. For rrii = 1, p is the Cauchy density, and 
since (1 + — )~ m "^5° exp(— x 2 ), the shape becomes finally Gaussian for 
large rrii. The kernel is not normalized, so it is not necessarily a probability 
density. We have fJ-i, rrii, a,i) = 1, so the weight 7$ is the height at the 
maximum. For each Pearson VII kernel, we have to estimate four parameters: 
fj,i, rrii, en and the weight 7$. 

For fixed k (starting with k = 1), we consider signals of the form 

k 

(29) /pfe,n(*) = A) + Pit + ^2~fip{t; /j>i,mi,ai), 

i=i 




18 



DAVIES ET AL. 



where p(t; /ij,mi,aj) is the Pearson VII function with parameters as de- 
scribed above. The parameters (5q and (3\ are added to allow small changes 
in the baseline estimate and should only have small values, for examples, 
values between ±do := ±5% of the height of the initial baseline estimate 
for Pq and between — d\ and d\ for f3\. We choose d\ so that the slope of 
the baseline estimate can change by at most 5 counts per 20. Thus, the 
parameter vector 

ft = (A),/3i,7i,^i,mi,ai, . . . ,~/ k , n k ,m k ,a k ) 

completely determines the shape. Since it is not possible to check directly 
whether an adequate approximation of given complexity k exists which sat- 
isfies our criterion, we focus on one or several promising candidates. As the 
estimate is required to be close to the data with small residuals, a natural 
choice is the nonlinear weighted least squares estimate, where the weights 
are the reciprocals of the scale estimates (11). In the heteroscedastic case, 
we use (23) instead. This leads to the following optimization problem: 



(30) R{ft) = £ { UkA %^ h) m ) 2 — mini, 



3=1 



with 



subject to 



i=l 

-dj<fij<dj (j = 0,l), 

7i,ai>0 (i = l,...,k), 

ti <//!<•■• <fj, k <t m (i = 1, . . . , k), 

mi > 1 (i = 1, . . . , k). 

Simple re-parameterizations can be used to eliminate the interval con- 
straints, for example, logarithms and affine transformations of the logit- 
function. For k > 1 every signal has k\ different parameterizations because 
of interchangeability of the kernels, and a reduction of the search space is 
achieved by enforcing an ordering in the location parameters jjL\ < • • • < fj, k . 
An appropriate transformation is given by Jupp (1978). 

Since (30) generally has a large number of local minima, we proceed itera- 
tively in the following manner. We choose a starting value at random from a 
uniform distribution over a suitably chosen rectangular set which contains all 
reasonable parameter values. This is followed by a Newton-type procedure 
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to find the nearest local minimum of R. We use the so-called BFGS-Method 
as described in Chapter 3.2 of Fletcher (2000). The local minimum of R is 
then compared to the lowest value previously found. If it is lower, we check 
the conditions (26), and stop if they are fulfilled. In this case, an adequate 
approximation with given complexity has been found. Otherwise, we draw 
a new starting value at random and repeat these steps. If no adequate ap- 
proximation is found within a specified number of iterations, the output is 
the best local minimum of R that has been found. The number of kernels is 
then increased by one, and the procedure is started anew. 

Note that this optimization algorithm does not directly aim at obtaining 
an adequate approximation in the sense of the criterion (26) but tries to 
find local optima of the weighted least squares residual function. This is less 
difficult as R is infinitely differ entiable. 

When checking (26), we standardize the residuals using 



instead of (11). In the case of heteroscedastic ground noise, we use the 
estimate given in (23). Since we only consider one segment of the data at a 
time, we have a much smaller number of observations. This allows us to use 
all subintervals of ti, . . . ,t m in (26). We use an efficient algorithm given by 
Bernholt and Hofmeister (2006) for this. The value of Cl is determined by 
means of simulation, since the asymptotic choice given in Section 3 is not 
valid for small sample sizes. 

It is in the nature of this problem that multiple solutions may exist, 
especially when fitting two or more kernels. If only one solution is required, 
then the process could be terminated here. However, in the context of thin 
film data, some solutions may be physically relevant and others not. For this 
reason, even after a solution has been found, the process is repeated a fixed 
number of times set by the user. This provides some idea of the variability of 
possible solutions for this particular segment of the data. In some cases these 
will be very similar, but they might also differ strongly. An example of this 
is described in the next section. The experimenter may then either choose 
the solution that is the most meaningful, based on partial prior knowledge 
about possible components of the material under consideration or on the 
results for the other peak intervals, or decide that no physically meaningful, 
unambiguous interpretation of this part of the data is possible. 

Once a solution is found, the characteristics of the peak components can 
be estimated by calculating the values for the fitted curves. For Pearson VII 
curves as used here, the corresponding weight parameter ji equals the max- 
imum height. The integrated intensity ij of the ith component is obtained 




by 



r(m« - 1/2) y/TTiriiai 
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cf. Hall et al. (1977). The full width at half maximum of the ith kernel 
depends only on the shape and scale parameters rrn and Oj, and can be 
calculated explicitly by 



cf. Hall et al. (1977). Of course, Ii and FWHMi must be scaled appropriately 
according to the grid width. If an interval contains two or more strongly 
overlapping peak components, or if the components have very low intensities, 
the values calculated may not be reliable. 

9. Physical interpretation of some results. The diffractogram presented 
in Figure 1 was taken on a thin film of In203:Sn, that is, indium oxide 
doped with tin. This material is usually called ITO (indium-tin-oxide). It 
has a good electrical conductivity and is transparent in the visible wave 
length region. Its main application is as top electrode in liquid crystal dis- 
plays (LCDs) or in antireflective coatings on TV monitors [Mergel (2006)]. 
X-ray diffractograms of such films contain important information on the ma- 
terial's quality, as it was shown that the optical and electrical properties are 
correlated with the lattice expansion that can be deduced from the position 
of the peaks [Mergel and Qiao (2004)] . 

The thin film consists of many small crystalline grains with different 
crystallographic orientations and the experimental conditions for taking the 
diffractograms are those of powder diffraction [Smart and Moore (2001)]. X- 
ray reflexes are observed when a set of grains satisfies the Bragg condition 
for a particular diffraction angle 



where d is the distance between the lattice planes of the grains parallel to 
the sample surface, 9 is the diffraction angle and A is the wave length of the 
x-ray radiation used in the diffraction apparatus. To give an example, the 
peaks in the diffractogram of Figure 1 at about 30.4° and 35.4° are assigned 
to ITO grains with orientation (222) and (400), respectively. The numbers 
in parentheses designate the Miller indices. 

The Miller indices are characteristic of the orientation of the specimen. 
For crystals with a cubic lattice, the Miller indices give the vector normal to 
the planes, (2,2,2) and (4,0,0) for the examples given above and the distance 
between the planes is obtained from 



where ao is the lattice constant, that is, the lateral extension of the unit cell 
of the crystal and h, k, I are the Miller indices, (hkl). 




(31) 



2dsin(0) = A, 



(32) 



dhkl 



Vh 2 + k 2 + l 2 ' 



PEAKS IN X-RAY DIFFRACTOGRAMS 



21 



6 - 



E 

e 



i 



2 



♦ 

i-c 



* single 
□ major 
i minor 



222 400 420 440 611 
Lattice planes (Miller indices) 

Fig. 8. Lattice distortion of the peaks represented in Table 1. Only a lattice expansion is 
observed. The full symbols designate the results of peak fits with one center and the open 
symbols those where two centers have been used. 



All possible values of d^ki of an ideal cubic crystal can be calculated from 
equation (32) when the lattice constant is known. In the case of 1^03, the 
basic substance of ITO, ao = 1.0118 nm. From (i/^-values the positions of 
all diffraction peaks can be calculated using the Bragg condition (31). Vice 
versa, the values for d can be deduced from the position of the diffraction 
peaks and assigned to (hkl) planes. This has been done for the peaks of 
Figure 8. 

We do not expect ideal crystals in our sample because the thin film has 
been deposited under nonequilibrium conditions and has been subjected to 
ion bombardment during film growth. Therefore, it is interesting to evaluate 
the deviation of the lattice constants of the grains from the ideal value. 
This deviation is characterized by the lattice distortion (d — do)/do = Ad/do, 
where d denotes the experimentally found value and do that of the ideal 
crystal calculated from equation (32) with the ideal lattice constant. In 
the case of ITO, the lattice distortion is different for the different grain 
orientations [Mergel et al. (2000)]. 

The results of the evaluation of some peaks of the sample are displayed 
in Table 1 in order to discuss the importance of the statistical analysis for 
physical problems. The lattice distortion for these peaks is depicted in Figure 
8. 

The FWHM is correlated with the quality of the crystalline grains. A large 
FWHM indicates small grains or a large density of lattice defects that are 
both responsible for the reduction of the correlation length of the crystalline 
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Table 1 

Evaluation of some peaks of the diffractogram of Figure 1. The first three columns 

designate the Miller indices (hkl), the peak position and the lattice distortion, 
respectively. The last column indicates whether the decomposition is accepted by the 

residual criterion (26) 





29 


Ad/do [%o] 


Height 


Intensity 


FWHM 


m 


Accepted 


(222) 


30.42 


5.1 


324 


96 


0.27 


7.2 


Yes 




35.34 


3.1 


1548 


288 


0.15 


2.1 


No 


(400) 


35.33 


3.5 


1614 


250 


0.12 


1.5 






35.43 


0.8 


487 


51 


0.07 


1.0 


Yes 


(420) 


39.66 


3.5 


38 


11 


0.18 


1.0 


Yes 




50.75 


4.9 


460 


170 


0.34 


6.4 


No 




50.68 


6.2 


131 


33 


0.16 


1.0 




(440) 


50.77 


4.6 


378 


144 


0.35 


8.5 


Yes 




50.71 


5.6 


420 


129 


0.27 


3.2 






50.89 


2.3 


155 


62 


0.27 


1.1 


Yes 


(611) 


55.76 


3.5 


119 


39 


0.29 


3.5 


Yes 



lattice. The (222)-oriented grains exhibit a big lattice expansion together 
with a big FWHM. This correlation is physically meaningful because both 
effects originate from a strong incorporation of additional oxygen into the 
lattice leading to an expansion of the lattice and a big density of lattice 
defects [Mergel et al. (2000)]. The fit of the peak mentioned in Table 1 is 
shown in the first row of Figure 9. 

The (400)-oriented grains are characterized by a small lattice expan- 
sion and a small FWHM. This is consistent with literature where it is re- 
ported that this orientation is most resistant against bombardment with 
oxygen and, therefore, incorporates less oxygen than the other orientations 
[Mergel et al. (2000)]. The residual criterion excludes the solution with one 
single peak but accepts the two-peak solution reported in Table 1 and Fig- 
ure 8. This comprises a minor contribution with a lattice constant close to 
the ideal one indicating that there exists a group of crystalline grains that 
did not incorporate any additional oxygen at all. Such grains may exist in a 
certain depth of the film where ideal growth conditions have been met. The 
fit with one kernel as well as the decomposition into two kernels is shown in 
the second and third rows of Figure 9, respectively. 

The situation is more complicated for the (440)-oriented grains as shown 
in Figure 10. The single-peak solution is discarded by the residual criterion 
(26). There exist at least two equally acceptable two-peak solutions, one with 



PEAKS IN X-RAY DIFFRACTOGRAMS 



23 




34 35 4 36.0 35.0 34.4 34.0 

Fig. 9. Fit for the peaks corresponding to (222) -oriented grains (first row) and fits with 
one and two kernels for the (400)-oriented grains (second and third rows). The left column 
shows the data and the resulting fits, the right column shows the individual components. 

the major peak at the leading position and the other one with the major peak 
at the trailing position. The first case yields two different FWHMs (0.35 and 
0.16), similar to the two-peak solution for (400), whereas the second case 
yields two similar FWHMs. A priori, there is no physical reason to prefer 
one solution over the other and further investigation is needed to decide. 

The intensity of the peaks is compared to that observed for ideal powder 
samples. In our case it is found that the (400) grains are much more and 
(222) grains much less abundant than expected for an equal distribution. 
This is an indication of a strong ion bombardment during deposition to 
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50 50,5 51.0 50.0 50 5 51.0 




50.0 50.5 51.0 50.0 50 5 51,0 



Fig. 10. Fit for the peaks corresponding to (440)-oriented grains. The fit with one kernel 
(first row) is rejected, but there exist at least two different fits with two kernels that are 
accepted by the criterion (second and third rows). The left column shows the data and the 
resulting fits, the right column shows the individual components. 

which (400) planes are not sensitive and which leads to a strong oxygen 
incorporation in (222) grains hampering their growth. 

10. Discussion. In this article we have proposed a fully automatic five- 
step procedure that determines the number, positions, powers and shapes 
of the relevant peaks and their components in x-ray diffractograms. It can 
be applied when little or no prior knowledge of approximate peak positions 
is available, as is often the case in the analysis of the morphology of thin 
films. The procedure is based on recent advances in nonparametric regression 
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and denoising techniques, the taut string method and weighted smoothing 
splines. The taut string method is very successful in producing approxima- 
tions with a small number of local extremes and is therefore used in step 
one to determine the positions of the peaks. As the approximation is piece- 
wise constant, it cannot be used for the detection of the boundaries of the 
peaks, which are necessary for a separate fit of the baseline. This problem 
is solved using weighted smoothing splines — used in step two — to give an 
approximation to the data which is twice continuously differentiable. The 
peak locations derived from the taut string and the first derivative of the 
fitted spline are used in step three to determine baseline and peak regions 
of the data set. The threshold of the first derivative used to define the base- 
line is obviously dependent on the problem under consideration and cannot 
be decided purely on statistical principles. In step four of the procedure 
weighted smoothing splines are again used to estimate the baseline which is 
required to determine the power of the peaks. These first four steps of the 
procedure are computationally fast and were carried out in about 1 minute 
on a standard PC (AMD Athlon 64 X2 Dual Core 6000+, 3.01 GHz, 3.21 
GB RAM) for the data set used as an illustration throughout the article 
(n = 7001). 

The last step of the procedure — decomposition of the peaks — requires 
the solution of a nonlinear least-squares problem. Here multiple solutions 
may occur, especially when more than one kernel is fitted. A stochastic 
search algorithm is used to find several candidate solutions. It is left to 
the experimenter to decide which of these multiple solutions are physi- 
cally relevant. Calculation times may vary for this step depending on the 
number of intervals identified in the previous steps, the number of kernels 
needed to decompose the peaks and the number of alternative solutions 
desired. Most of the calculation time is spent attempting to fit a model 
with a smaller number of components before an upper limit of iterations 
is reached and the number of components is increased; in some cases this 
can take several minutes. However, if there is a solution with one kernel, it 
is usually found within the first few iterations. For the data set presented 
here, there were 25 intervals, 19 of which consisted of 1-component peaks. 
Calculation of the peak decompositions took slightly more than 2 minutes 
for step five, adding up to a total computation time of about 3.5 min- 
utes. 

Software. The procedure and the data set analyzed in this article are 
available in the R-package dif f ractometry on CRAN. It can be downloaded 
from http : //www. cran. r-project . org/. 
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